Cis-regulatory module detection using constraint programming
We propose a method for finding cis-regulatory modules (CRMs) in a set of co-regulated genes. Each CRM consists of a set of binding sites of transcription factors. We wish to find CRMs involving the same transcription factors in multiple sequences. Finding such a combination of transcription factors is an inherently combinatorial problem. We solve it by combining the principles of itemset mining and constraint programming. The constraints involve the putative binding sites of transcription factors, the number of sequences in which they co-occur, and the proximity of the binding sites. Genomic background sequences are used to assess the significance of the modules. We experimentally validate our approach and compare it with state-of-the-art techniques.
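The abstract's itemset view can be made concrete: treat transcription factors as items and sequences as transactions, and ask which TF combinations co-occur, within a proximity window, in enough sequences. The brute-force sketch below is only an illustration of that framing, not the paper's constraint-programming method, and the data layout (a dict mapping sequence IDs to per-TF site positions) is an assumption.

```python
from itertools import combinations, product

def supporting_sequences(tfs, sites_by_seq, window):
    """Sequences in which every TF in `tfs` has a binding site, with one
    site per TF fitting inside a `window`-bp stretch."""
    support = []
    for seq_id, sites in sites_by_seq.items():
        if not all(tf in sites for tf in tfs):
            continue
        positions = [sites[tf] for tf in tfs]  # candidate sites per TF
        if any(max(combo) - min(combo) <= window
               for combo in product(*positions)):
            support.append(seq_id)
    return support

def mine_crms(tf_names, sites_by_seq, window, min_seqs, max_size=3):
    """Enumerate TF combinations (itemsets) whose sites co-occur, within
    the proximity window, in at least `min_seqs` sequences."""
    crms = []
    for size in range(2, max_size + 1):
        for tfs in combinations(tf_names, size):
            support = supporting_sequences(tfs, sites_by_seq, window)
            if len(support) >= min_seqs:
                crms.append((tfs, support))
    return crms
```

A constraint solver replaces this exhaustive enumeration with propagation over the support and proximity constraints, which is what makes larger instances tractable.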
Direct mining of subjectively interesting relational patterns
Data is typically complex and relational, making the development of relational data mining methods an increasingly active research topic. Recent work has produced new formalisations of patterns in relational data and a way to quantify their interestingness subjectively, taking into account the data analyst's prior beliefs about the data. Yet a scalable algorithm to find the most interesting such patterns has been lacking. We introduce a new algorithm based on two ideas: (1) the use of Constraint Programming, which results in notably shorter development time, faster runtimes, and more flexibility for extensions such as branch-and-bound search, and (2) the direct search for only the most interesting patterns, instead of exhaustively enumerating patterns before ranking them. Through empirical evaluation, we find that our novel bounds yield speedups of up to several orders of magnitude, especially on dense data with a simple schema. This makes it possible to mine the most subjectively interesting relational patterns in databases where this was previously impractical or impossible.
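The core of "direct search" is branch-and-bound: an optimistic bound on the best score reachable below a node lets the search skip whole subtrees instead of enumerating and ranking everything. The sketch below shows this on plain itemsets with a placeholder score (support × size), far simpler than the paper's subjective interestingness, using the fact that support can only shrink as items are added.

```python
def support(items, transactions):
    """Number of transactions containing every item in `items`."""
    return sum(1 for t in transactions if items <= t)

def best_pattern(all_items, transactions):
    """Depth-first branch-and-bound for the itemset maximizing
    support * size, pruning subtrees whose optimistic bound cannot
    beat the incumbent best."""
    best = (0, frozenset())

    def search(pattern, candidates):
        nonlocal best
        sup = support(pattern, transactions)
        score = sup * len(pattern)
        if score > best[0]:
            best = (score, pattern)
        # optimistic bound: support never grows when items are added,
        # so no extension can score above sup * (current + remaining)
        if sup * (len(pattern) + len(candidates)) <= best[0]:
            return
        for i, item in enumerate(candidates):
            search(pattern | {item}, candidates[i + 1:])

    search(frozenset(), sorted(all_items))
    return best
```

The tighter the bound, the more of the pattern lattice is skipped; the paper's speedups come from bounds tailored to the subjective interestingness measure.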
Leveraging Contextual Information for Robustness in Vehicle Routing Problems
We investigate the benefit of using contextual information in data-driven demand predictions to solve the robust capacitated vehicle routing problem with time windows. Instead of estimating the demand distribution or its mean, we introduce contextual machine learning models that predict demand quantiles even when the number of historical observations for some or all customers is limited. We investigate the use of such predicted quantiles to make routing decisions, comparing deterministic with robust optimization models. Furthermore, we evaluate the efficiency and robustness of the resulting decisions, using both exact and heuristic methods to solve the optimization models. Our extensive computational experiments show that using a robust optimization model and predicting multiple quantiles is promising when substantial historical data is available. In scenarios with a limited demand history, using a deterministic model with a single quantile shows greater potential. Interestingly, our results also indicate that using appropriate quantile demand values within a deterministic model yields solutions with robustness levels comparable to those of robust models. This matters because, in most applications, practitioners use deterministic models as the industry standard, even in uncertain environments. Moreover, since they pose fewer computational challenges and require only a single demand-value prediction, deterministic models paired with an appropriate machine learning model hold the potential for robust decision-making.
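Quantile prediction is usually trained with the pinball (quantile) loss, whose asymmetry steers predictions toward the chosen quantile rather than the mean. As a simplification of the paper's contextual models, the sketch below shows the loss and the no-context special case, where the minimizing constant prediction over a customer's demand history is an empirical quantile.

```python
def pinball_loss(y_true, y_pred, tau):
    """Pinball (quantile) loss: under-predictions cost tau per unit,
    over-predictions cost (1 - tau), steering y_pred toward the
    tau-quantile of the demand distribution."""
    diff = y_true - y_pred
    return tau * diff if diff >= 0 else (tau - 1) * diff

def empirical_quantile(history, tau):
    """A constant prediction minimizing average pinball loss over
    `history`: an empirical tau-quantile (upper variant on ties)."""
    s = sorted(history)
    idx = min(int(tau * len(s)), len(s) - 1)
    return s[idx]
```

Feeding, say, the 0.9-quantile instead of the mean into a deterministic routing model is one way to obtain the robustness-without-a-robust-model effect the abstract describes.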
Data Driven VRP: A Neural Network Model to Learn Hidden Preferences for VRP
The traditional Capacitated Vehicle Routing Problem (CVRP) minimizes the total distance of the routes under the capacity constraints of the vehicles. In practice, however, the objective often involves multiple criteria: not only the total distance of the tour but also factors such as travel costs, travel time, and fuel consumption. Moreover, route planners and drivers hold numerous implicit preferences. Drivers, for instance, are familiar with certain neighborhoods, know the state of the roads, and often consider the best places for rest and lunch breaks. This knowledge is difficult to formulate and balance when operational routing decisions have to be made.
This motivates us to learn the implicit preferences from past solutions and to incorporate these learned preferences into the optimization process. The preferences take the form of arc probabilities: the more preferred a route is, the higher its joint probability. The novelty of this work is the use of a neural network model to estimate the arc probabilities, which allows for additional features and automatic parameter estimation. This first requires identifying suitable features, neural architectures, and loss functions, taking into account that little data is typically available. We investigate the difference with a prior weighted Markov counting approach and study the applicability of neural networks in this setting.
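To make "a model that maps arc features to arc probabilities" concrete, the sketch below trains a one-layer network (logistic regression) by gradient descent on cross-entropy loss. The feature set and labels are hypothetical (1.0 if the arc appeared in past planner-accepted routes), and this is a stand-in for, not a reconstruction of, the paper's architecture.

```python
import math

def train_arc_model(rows, labels, lr=0.5, epochs=2000):
    """One-layer network: arc features -> probability the arc is used
    in a planner-accepted route. `rows`: list of feature vectors;
    `labels`: 1.0 if the arc appeared in past solutions, else 0.0."""
    w = [0.0] * len(rows[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))       # sigmoid output
            for i, xi in enumerate(x):
                grad_w[i] += (p - y) * xi        # cross-entropy gradient
            grad_b += p - y
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(rows)
    return w, b

def arc_probability(w, b, x):
    """Predicted probability that the arc described by features x is used."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))
```

The learned arc probabilities can then be turned into arc costs (e.g. negative log-probabilities) inside the routing solver, so that preferred arcs become cheap.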
Efficiently Explaining CSPs with Unsatisfiable Subset Optimization (extended algorithms and examples)
We build on a recently proposed method for stepwise explanation of solutions to Constraint Satisfaction Problems (CSPs) in a human-understandable way. An explanation here is a sequence of simple inference steps, where simplicity is quantified using a cost function. The algorithms for explanation generation rely on extracting Minimal Unsatisfiable Subsets (MUSs) of a derived unsatisfiable formula, exploiting a one-to-one correspondence between so-called non-redundant explanations and MUSs. However, MUS extraction algorithms provide no guarantee of subset minimality or optimality with respect to a given cost function. We therefore build on these formal foundations and tackle the main point of improvement, namely how to efficiently generate explanations that are provably optimal with respect to the given cost metric. To that end, we developed (1) a hitting set-based algorithm for finding optimal constrained unsatisfiable subsets; (2) a method for reusing relevant information across multiple algorithm calls; and (3) methods that exploit domain-specific information to speed up explanation sequence generation. We experimentally validated our algorithms on a large number of CSPs and found that they outperform the MUS approach in both explanation quality and computational time (on average up to 56% faster than a standard MUS approach).
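The hitting-set idea rests on a duality: every unsatisfiable subset must intersect every correction subset (the complement of a maximal satisfiable subset), so a minimum-cost hitting set of the correction subsets collected so far that is itself unsatisfiable is cost-optimal. The sketch below illustrates that loop on toy constraints over a few booleans; the brute-force SAT check and hitting-set enumeration are didactic simplifications, not the paper's algorithm.

```python
from itertools import product

def sat(subset, n_vars):
    """Brute force: does some boolean assignment satisfy every constraint?"""
    return any(all(c(a) for c in subset)
               for a in product([False, True], repeat=n_vars))

def min_cost_hitting_set(universe, cost, sets):
    """Brute-force minimum-cost subset of `universe` hitting every set."""
    best_cost, best = float("inf"), None
    for mask in range(1 << len(universe)):
        chosen = [universe[i] for i in range(len(universe)) if mask >> i & 1]
        if all(any(e in chosen for e in s) for s in sets):
            c = sum(cost[i] for i in range(len(universe)) if mask >> i & 1)
            if c < best_cost:
                best_cost, best = c, chosen
    return best

def optimal_unsat_subset(constraints, cost, n_vars):
    """Implicit hitting set loop: return a minimum-cost unsatisfiable
    subset of `constraints` (cost-optimal, since any unsatisfiable
    subset must hit every recorded correction subset)."""
    to_hit = []  # correction subsets collected so far
    while True:
        hs = min_cost_hitting_set(constraints, cost, to_hit)
        if not sat(hs, n_vars):
            return hs
        # grow hs into a maximal satisfiable subset; its complement is a
        # correction subset, so record it and iterate
        mss = list(hs)
        for c in constraints:
            if c not in mss and sat(mss + [c], n_vars):
                mss.append(c)
        to_hit.append([c for c in constraints if c not in mss])
```

In the paper's setting the cost function encodes explanation simplicity, so the optimal unsatisfiable subset corresponds to the simplest next inference step.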
Knowledge Refactoring for Inductive Program Synthesis
Humans constantly restructure knowledge to use it more efficiently. Our goal is to give a machine learning system similar abilities so that it can learn more efficiently. We introduce the knowledge refactoring problem, where the goal is to restructure a learner's knowledge base to reduce its size and minimise redundancy. We focus on inductive logic programming, where the knowledge base is a logic program. We introduce Knorf, a system which solves the refactoring problem using constraint optimisation. We evaluate our approach on two program induction domains: real-world string transformations and building Lego structures. Our experiments show that learning from refactored knowledge can improve predictive accuracies fourfold and halve learning times.
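The flavour of refactoring can be illustrated with a single greedy step: find body structure shared by many rules, extract it into a new predicate, and rewrite the rules to call it. Knorf instead optimises over candidate refactorings with a constraint solver; the propositional "rules" (a dict of head to body literals) and the invented predicate name "aux" below are purely illustrative assumptions.

```python
from collections import Counter
from itertools import combinations

def refactor_once(rules):
    """One greedy refactoring step: extract the body-literal pair shared
    by the most rules into a new predicate, rewriting those rules to
    call it. Returns (new_rules, new_predicate_or_None)."""
    pair_counts = Counter()
    for body in rules.values():
        for pair in combinations(sorted(set(body)), 2):
            pair_counts[pair] += 1
    if not pair_counts:
        return rules, None
    (a, b), n = pair_counts.most_common(1)[0]
    if n < 2:
        return rules, None  # no shared structure worth extracting
    new_pred = "aux"        # hypothetical invented-predicate name
    out = {new_pred: [a, b]}
    for head, body in rules.items():
        if a in body and b in body:
            out[head] = [l for l in body if l not in (a, b)] + [new_pred]
        else:
            out[head] = list(body)
    return out, new_pred
```

Repeating such steps shrinks the total literal count when structure is shared widely enough; choosing which extractions to perform jointly, rather than greedily, is what the constraint optimisation formulation handles.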